{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Files (JSON & CSV) — Class Notes\n", "\n", "## Try me\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Data%20Manipulation/class%20notes/files_class_notes_notebook.ipynb)[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ffraile/computer_science_tutorials/main?labpath=source%2FData%20Manipulation%2Fclass%20notes%2Ffiles_class_notes_notebook.ipynb)" ], "id": "61036bb3282edcdf" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Introduction\n", "### Motivation\n", "- Understand how code reads, writes, and interacts with data files.\n", "- From basic text files to formats like JSON and CSV, designed and optimized for data exchange (readable by both humans and machines).\n", "- Crucial for many applications: everything in computing comes back to storage" ], "id": "51cbb067c22eac61" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### Objectives\n", "- Understand file handling basics in Python (opening, reading, writing, closing).\n", "- Understand the structure and use cases of JSON and CSV formats.\n", "- Learn how to read and write JSON and CSV files using Python's built-in libraries." 
], "id": "5e0fd39e0337b337" }, { "metadata": {}, "cell_type": "markdown", "source": [
"### Intro for non-programmers (I)\n",
"- Fundamentally, a file is just a collection of bytes stored on a disk (waiting for us to give them meaning).\n",
"- Files can store different types of data (text, images, videos, etc.), and information needs to be transformed into bytes (that is, information needs to be encoded).\n",
"- UTF-8 is the most common encoding for text files (an encoding is just a set of rules that dictates how characters are turned into bytes and back).\n",
"- Line breaks are encoded as special characters (e.g., `\\n` for a new line). The end of the file is not a stored character: the operating system knows the file's size and reports ```EOF``` (end of file) when there are no more bytes to read.\n",
"- When programs read and write files, they are just reading or writing a sequence of bytes; reading stops when the operating system reports ```EOF```."
], "id": "7100a91779706ae9" }, { "metadata": {}, "cell_type": "markdown", "source": [
"### Intro for non-programmers (II)\n",
"- A file system is just a way to organize and store files on a disk (folders/directories, paths, etc.).\n",
"- The file path is just the location of a file in the file system (e.g., `C:\\Users\\Alice\\Documents\\file.txt` in Windows, or `/home/alice/documents/file.txt` in Linux/Mac).\n",
"- Programs need to know the file path to read or write a file.\n",
"- Relative paths are resolved from the current working directory, usually the folder where your script runs (a relative path tells the program \"look for the file starting from exactly where we are\").\n",
"- Absolute paths start from the root of the file system.\n",
"\n",
"### Agenda\n",
"- Intro and agenda (15 min)\n",
"- File handling basics (15 min)\n",
"- JSON format and handling (15 min)\n",
"- CSV format and handling (20 min)\n",
"- Code cards (5 min)\n",
"- Wrap-up (5 min)\n",
"- Hands-on assignment (30 min)\n"
], "id": "1b138c055ac0f146" }, { "cell_type": "markdown", "metadata": {}, "source": "## A0) Setup (helpers)", "id": "22c1d2621f2ac61b" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"from io import StringIO\n", "import json, csv" ], "id": "f366537f736b3251" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A0.1) File fundamentals\n", "- Text files: UTF-8 encoding (default in Python 3).\n", "- Open a file with `open(filename, mode)`.\n", "- ```filename``: string with the file path.\n", "- ```mode```: string with the mode to open the file (safety mechanism, your way to tell the operating system what you are planning to do).\n", "\n", "| Character | Meaning |\n", "|-----------|-----------------------------------------------------------------|\n", "| 'r' | open for reading (default) |\n", "| 'w' | open for writing, truncating the file first |\n", "| 'x' | open for exclusive creation, failing if the file already exists |\n", "| 'a' | open for writing, appending to the end of file if it exists |\n", "| 'b' | binary mode |\n", "| '+' | open for updating (reading and writing) |\n", "\n" ], "id": "1182dd581a08a803" }, { "metadata": {}, "cell_type": "markdown", "source": [ "**Python Mechanicss**\n", "- ```open``` function is like a gateway to the file. It returns a **file object** that you can use to read from or write to the file.\n", "- Mechanics are very elegant:\n", " - ```file.write(string)``` to write text strings to a file.\n", " - ```file.readline()```reads a single line from the file. If used on a loop, you will know when the file ends when it returns an empty string.\n", " - ```file.readlines()``` reads all lines from the file and returns them as a list\n", " - If you do not use ```file.close()```, the file will remain open, blocking other programs from accessing it, and potentially causing data loss. You must use it always unless...\n", " - You use ```with open(...) as file:```, which automatically closes the file when you exit the block." 
], "id": "9e5fcb1baea76e3c" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [
"# Example 1: Read input from the user and append it to a file\n",
"with open(\"example.txt\", 'a', encoding=\"utf-8\") as f:\n",
"    while True:\n",
"        line = input(\"Write something to append to the list or press Enter to exit: \")\n",
"        if line:\n",
"            f.write(line + \"\\n\")  # \"\\n\" is the newline character\n",
"        else:\n",
"            break\n",
"\n",
"# What happens if example.txt already exists? Try changing the mode from 'a' to 'w' or 'x'!"
], "id": "6116130d4d8ef13a" }, { "metadata": {}, "cell_type": "markdown", "source": [
"## How to find files in your file system (Colab/Local)\n",
"- In Colab, click the folder icon 📁 on the left panel to open the file explorer:\n",
"\n",
"![Image showing Colab file explorer](../tutorials/img/colabs_import.png)\n",
"\n",
"- On a local machine, you will find the files in the current working directory (usually the directory where you started your Python script).\n",
"- If you want to write files to a specific directory, you need to provide the full or relative path (check the tutorial)."
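, "\n",
"\n",
"To see where a relative path actually points, the following sketch uses the standard `pathlib` module. The file name `notes_demo.txt` is a made-up example, and nothing is written to disk.\n",
"\n",
"```python\n",
"from pathlib import Path\n",
"\n",
"# Current working directory: relative paths are resolved against it\n",
"cwd = Path.cwd()\n",
"print(\"Working directory:\", cwd)\n",
"\n",
"# Hypothetical file name, used only for this illustration\n",
"relative_path = Path(\"notes_demo.txt\")\n",
"\n",
"# Joining the working directory and the relative path gives the absolute path\n",
"absolute_path = cwd / relative_path\n",
"\n",
"print(relative_path.is_absolute())  # False\n",
"print(absolute_path.is_absolute())  # True\n",
"print(\"open() would look for the file at:\", absolute_path)\n",
"```\n",
"\n",
"Building the path with `cwd / relative_path` does not touch the disk, so the file does not need to exist for this check."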
], "id": "515f59f71d3829a0" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [
"# Example 2: Read lines from a file\n",
"with open(\"example.txt\", \"r\", encoding=\"utf-8\") as f:\n",
"    lines = f.readlines()\n",
"    for line in lines:\n",
"        print(line, end=\"\")  # each line already ends with \"\\n\""
], "id": "3d36a275cdfc85af" }, { "metadata": {}, "cell_type": "markdown", "source": [
"## A1) JSON fundamentals\n",
"- Structured format that pretty much every programming language can understand.\n",
"- JSON stands for JavaScript Object Notation. It is a serialization format (serialization means transforming an object into a format that can be stored or transmitted).\n",
"- This makes JSON great for data exchange between different systems, plus it's human-readable.\n",
"- As a Python developer, think of a JSON file as a nested combination of dicts and lists.\n",
"- Almost the same notation as Python dicts/lists, but strings always use double quotes and the literals are `true`, `false`, and `null`.\n",
"- `json.dumps`/`json.loads` convert to and from strings; `json.dump`/`json.load` write to and read from file objects."
], "id": "bb2a0f8387e16d23" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [
"# Example 3: Write a JSON file\n",
"student = {\n",
"    \"id\": 101,\n",
"    \"name\": \"Peter Parker\",\n",
"    \"email\": \"pete@oscorp.com\",\n",
"    \"enrolled\": True,\n",
"    \"courses\": [{\"code\": \"CS101\", \"grade\": 9.5}, {\"code\": \"CS102\", \"grade\": 8.75}],\n",
"    \"note\": \"Uses unicode: café ☕\"\n",
"}\n",
"# The with-block opens the file for writing and closes it automatically\n",
"with open(\"student_101.json\", \"w\", encoding=\"utf-8\") as student_file:\n",
"    json.dump(student, student_file)\n",
"# Check the file content in the file system (folder icon on the Colab left panel)\n"
], "id": "fb1ebedf2e3edf79" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [
"# Example 4: Read JSON file\n",
"with open(\"student_101.json\", \"r\", encoding=\"utf-8\") as student_file:\n",
"    loaded_student = json.load(student_file)\n",
"    if loaded_student[\"courses\"][0][\"code\"] == \"CS101\":\n",
"        print(\"Loaded OK.\")"
], "id": "5898b788d5397cc7" }, { "cell_type":
"markdown", "metadata": {}, "source": [ "## A2) CSV fundamentals\n", "- Comma-Separated Values (CSV): text (UTF-8) for tables (rows, columns).\n", "- Each row is a line; each cell separated by commas (or other delimiter)\n", "- TAB delimiter `\\t` is also common (TSV files): Really handy (copy and paste from spreadsheets).\n", "- Example:\n", "```csv\n", "DATE, TIME, TEMPERATURE, HUMIDITY\n", "2022-08-31, 00:15, 25.5, 65\n", "2022-08-31, 00:30, 25.7, 66\n", "2022-08-31, 00:45, 25.9, 67\n", "2022-08-31, 01:00, 25.7, 66\n", "2022-08-31, 01:15, 25.5, 65\n", "```\n", "- Use Python's built-in `csv` module to read and write CSV files.\n", "- Hides the complexity of commas in text, quoting, etc.\n", "- Important: Use `newline=''` when writing CSV files to avoid extra blank lines on some platforms (Windows).\n" ], "id": "a015f427a928f26e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rows = [\n", " [\"id\",\"name\",\"comment\",\"score\"],\n", " [1, \"Alan Turing\", \"loves, commas\", 10],\n", " [2, \"Grace Hopper\", \"quotes \\\"are\\\" fine\", 9.5],\n", "]\n", "\n", "with open(\"CS_101.csv\", \"w\", newline='', encoding=\"utf-8\") as csvfile:\n", " writer = csv.writer(csvfile) # Default delimiter is comma, use delimiter=';' for semicolon or '\\t' for tab\n", " writer.writerows(rows)\n" ], "id": "64c99bfd1a4e4ee5" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with open(\"CS_101.csv\", \"r\") as csvfile:\n", " reader = csv.reader(csvfile)\n", " for row in reader:\n", " print(row)" ], "id": "610bed978a7f2420" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### Code Cards\n", "Card F1 - Modes\n", "Find the bug in this code:\n", "```python\n", "with open(\"data.txt\", \"r\") as f:\n", " f.write(\"Hello, World!\")\n", "```" ], "id": "358056ffb632ce5a" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## Card F2 - Predict the output\n", "Given the following CSV 
file content ```people.csv```:\n",
"```csv\n",
"name,age,city\n",
"Marc,30,New York\n",
"Eve,25,Los Angeles\n",
"```\n",
"What is the output of this code?\n",
"```python\n",
"with open(\"people.csv\", \"r\") as csvfile:\n",
"    reader = csv.reader(csvfile)\n",
"    for row in reader:\n",
"        if (row[\"name\"] == \"Eve\"):\n",
"            print(row[\"age\"])\n",
"```"
], "id": "5d5ebf4a29016c2e" }, { "metadata": {}, "cell_type": "markdown", "source": [
"### Card J1 - Find the bug\n",
"What is wrong with this JSON string?\n",
"```json\n",
"{\"name\": \"Alan Turing\",\n",
"\"age\": 41,\n",
"}\n",
"```\n",
"\n"
], "id": "a79036f405e5da73" }, { "metadata": {}, "cell_type": "markdown", "source": [
"### Card J2 - Non-serializable object\n",
"What is wrong with this code?\n",
"\n",
"```python\n",
"data = {\"s\": {1,2,3}}\n",
"data_file = open(\"data.json\", \"w\")\n",
"json.dump(data, data_file)\n",
"```"
], "id": "d5a6fecbce7a2271" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [
"# Card J2 demo: run this cell to see the error that json.dumps raises\n",
"data = {\"s\": {1,2,3}}\n",
"json_str = json.dumps(data)"
], "id": "439ade376804430a" }, { "metadata": {}, "cell_type": "markdown", "source": [
"### Card J3 - Predict the result\n",
"What is the output of this code?\n",
"```python\n",
"data = {\"name\": \"Peter Parker\", \"age\": 21, \"id\": \"S435B\", \"courses\": [\"CS101\", \"CS102\"]}\n",
"data_file = open(\"data.json\", \"w\")\n",
"json.dump(data, data_file)\n",
"data_file.close()\n",
"with open(\"data.json\", \"r\") as f:\n",
"    loaded_data = json.load(f)\n",
"    print(loaded_data[\"courses\"][1])\n",
"```\n"
], "id": "47074ab0154b9535" }, { "cell_type": "markdown", "metadata": {}, "source": [
"## Takeaways\n",
"- **JSON**: great for nested data; ensure valid JSON (no comments/trailing commas); control the output with `indent` and `ensure_ascii`.\n",
"- **CSV**: plain tabular text; be explicit with delimiter/quoting; watch commas in text; use `newline=''` when opening the file.\n"
], "id": "e4711d29fd920e58" } ], "metadata": { "kernelspec": {
"display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.x" } }, "nbformat": 4, "nbformat_minor": 5 }